For linear models, the target $Y$ (a number or a vector) is approximated by $$f(X) = W \cdot \hat{X},$$ where $\hat{X} = \begin{pmatrix}1\\X\end{pmatrix}$.
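As a concrete illustration, here is a minimal NumPy sketch of this affine map; the names `f`, `W`, and `X` mirror the formula, the weight values are arbitrary, and the first column of $W$ acts as the bias picked up by the constant $1$ in $\hat{X}$.

```python
import numpy as np

# Linear model f(X) = W @ X_hat, where X_hat prepends a constant 1 to X.
def f(W, X):
    X_hat = np.concatenate(([1.0], X))  # X_hat = (1, X)
    return W @ X_hat

W = np.array([[0.5, 2.0, -1.0]])  # 1 output; first column is the bias
X = np.array([3.0, 4.0])
print(f(W, X))                    # [2.5] = 0.5 + 2*3 - 1*4
```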
A natural way to extend the linear model is to compose it with a non-linear function $\sigma$:
$$g(X) = \sigma(W \cdot \hat{X}).$$
Here, $\sigma$ is applied entrywise to the vector $W \cdot \hat{X}$.
This extension gives a single-layer neural network; the main goal of training is to learn the weight matrix $W$.
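A minimal sketch of this single-layer map, assuming the logistic sigmoid for $\sigma$ (the text leaves $\sigma$ unspecified; any entrywise non-linearity would do):

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid, applied entrywise by NumPy broadcasting."""
    return 1.0 / (1.0 + np.exp(-z))

# Single-layer network g(X) = sigma(W @ X_hat).
def g(W, X):
    X_hat = np.concatenate(([1.0], X))
    return sigmoid(W @ X_hat)  # sigma acts on each entry of W @ X_hat

W = np.array([[0.1, -0.3, 0.8],
              [0.0,  0.5, 0.5]])   # 2 outputs, 2 inputs (+ bias column)
print(g(W, np.array([1.0, 2.0])))
```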
A multilayer neural network model with $k$ hidden layers is of the form
$$X \mapsto (g_k \circ \cdots \circ g_2 \circ g_1 \circ g_0)(X), \qquad k \in \{0, 1, 2, \dots\},$$
where each $g_i$ has the same form as the single-layer map above, with its own weight matrix $W_i$.
The weight matrices $W_i$ associated with each layer are learned in the training process.
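The composition reduces to a loop over the weight matrices. The sketch below (reusing the `sigmoid` and layer map from above) uses random, untrained $W_i$ purely to illustrate the shapes involved; in practice these are the matrices fitted during training.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def layer(W, X):
    """One map g_i(X) = sigma(W_i @ X_hat)."""
    X_hat = np.concatenate(([1.0], X))
    return sigmoid(W @ X_hat)

def network(weights, X):
    """Apply g_0 first, then g_1, ..., up to g_k."""
    for W in weights:
        X = layer(W, X)
    return X

rng = np.random.default_rng(0)
weights = [rng.normal(size=(4, 3)),  # g_0: 2 inputs -> 4 hidden units
           rng.normal(size=(4, 5)),  # g_1: 4 -> 4 hidden units
           rng.normal(size=(1, 5))]  # g_2: 4 -> 1 output (k = 2)
print(network(weights, np.array([0.5, -1.0])))
```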
Sharpening Kernel
$$\begin{pmatrix}0&-1&0\\-1&5&-1\\0&-1&0\end{pmatrix}
=\begin{pmatrix}0&0&0\\0&1&0\\0&0&0\end{pmatrix} + \begin{pmatrix}0&-1&0\\-1&4&-1\\0&-1&0\end{pmatrix}$$
The decomposition shows why this kernel sharpens: the first term is the identity kernel, which keeps the original pixel, and the second term is the negative discrete Laplacian, which adds the difference between a pixel and its four neighbors. The kernel therefore amplifies differences between adjacent pixels, making edges more pronounced.
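A short sketch of applying this kernel to a synthetic grayscale image with `scipy.ndimage.convolve`; the step-edge image is illustrative, and any 2D array of pixel intensities would work the same way.

```python
import numpy as np
from scipy.ndimage import convolve

kernel = np.array([[ 0, -1,  0],
                   [-1,  5, -1],
                   [ 0, -1,  0]], dtype=float)

image = np.zeros((5, 5))
image[:, 3:] = 1.0  # a vertical step edge between dark and bright

sharpened = convolve(image, kernel, mode='reflect')
print(sharpened)  # pixels overshoot on both sides of the edge,
                  # exaggerating the transition
```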